Spatial-Temporal Semantic Grouping of Instructional Video Content

Authors

  • Tiecheng Liu
  • John R. Kender
Abstract

This paper presents a new approach to content analysis and semantic summarization of instructional videos of blackboard presentations. We first use low-level image processing techniques to segment frames into board content regions, regions occluded by instructors, and irrelevant areas, then measure the number of chalk pixels in the content areas of each frame. Using the number of chalk pixels as a heuristic measure of video content, we derive a content figure that describes the actual rather than the apparent fluctuation of video content. By searching for local maxima in the content figure, and by detecting camera motion and tracking the movements of instructors, we define and retrieve key frames. Since some video content may not be contained in any single key frame because of occlusion by instructors or camera motion, we use an image registration method to build “board content images” that are free of occlusions and not limited by frame boundaries. Extracted key frames and board content images are combined to summarize and index the video. We further introduce the concept of a “semantic teaching unit”, defined as a more natural semantic temporal-spatial unit of teaching content. We propose a model to detect semantic teaching units, based on the recognition of instructors' actions and on the measurement of the temporal duration and spatial location of board content. We report experiments on instructional videos recorded in non-instrumented classrooms, and show examples of the construction of board content images and the detection of semantic teaching units within them.
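The key-frame step described in the abstract (count chalk pixels per frame, then look for local maxima of the resulting curve) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the fixed brightness threshold, and the precomputed board mask are all assumptions made for illustration, and the sketch uses a raw per-frame count rather than the paper's content figure, which additionally corrects for occlusion and camera motion.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[int]]   # grayscale rows, values 0-255 (assumed input format)
Mask = Sequence[Sequence[bool]]   # True where the board is visible (assumed precomputed)


def count_chalk_pixels(frame: Frame, board_mask: Mask, threshold: int = 200) -> int:
    """Count bright pixels inside the visible board region.

    The threshold of 200 is a hypothetical value, not one taken from the paper.
    """
    return sum(
        1
        for row, mask_row in zip(frame, board_mask)
        for value, on_board in zip(row, mask_row)
        if on_board and value >= threshold
    )


def local_maxima(values: Sequence[float]) -> List[int]:
    """Return interior indices that rise above the left neighbor
    and are not exceeded by the right neighbor."""
    return [
        i
        for i in range(1, len(values) - 1)
        if values[i - 1] < values[i] >= values[i + 1]
    ]


# Toy per-frame chalk counts standing in for a content curve.
counts = [3, 5, 9, 4, 2, 6, 11, 10]
key_frame_candidates = local_maxima(counts)
print(key_frame_candidates)  # [2, 6]
```

In a real pipeline the counts would likely be smoothed before peak detection, and candidate frames would be filtered using the camera-motion and instructor-tracking cues the abstract mentions; both are left out here.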

Related resources

Spatio-Temporal Grouping Models and Prominence Models in Perceptual Organization for Semantic Interpretation of Video Shot

We focus on the problem of video shot interpretation by making use of perceptual grouping principles on the visual primitives (2D blobs) in a video shot. We present a novel scheme for modeling the homogeneous regions in the form of 2D blobs, that can be tracked easily across the frames. We describe a novel spatio-temporal perceptual grouping scheme, applied on blobs, that makes use of specified...

Recognition of Visual Events using Spatio-Temporal Information of the Video Signal

Recognition of visual events as a video analysis task has become popular in machine learning community. While the traditional approaches for detection of video events have been used for a long time, the recently evolved deep learning based methods have revolutionized this area. They have enabled event recognition systems to achieve detection rates which were not reachable by traditional approac...

A Survey of Spatio-Temporal Grouping Techniques

Spatio-temporal segmentation of video sequences attempts to extract backgrounds and independent objects in the dynamic scenes captured in the sequences. It is an essential step of video analysis. It has important applications in video coding, video logging, indexing and retrieval, and more generally in scene interpretation and video understanding. We classify spatio-temporal grouping techniques...

A New Wavelet Based Spatio-temporal Method for Magnification of Subtle Motions in Video

Video magnification is a computational procedure to reveal subtle variations during video frames that are invisible to the naked eye. A new spatio-temporal method which makes use of connectivity based mapping of the wavelet sub-bands is introduced here for exaggerating of small motions during video frames. In this method, firstly the wavelet transformed frames are mapped to connectivity space a...

Rule-based semantic summarization of instructional videos

We present a new content-based approach to summarize instructional videos. We first redefine “scene” in instructional videos. Focusing on one dominant scene type, that of handwritten lecture notes, we define semantic content as “ink pixels”, and present a low-level retrieval technique to extract this content from each frame with consideration of various occlusion and illumination effects. “Key ...

Publication date: 2003